NSF PAR Search | NSF Public Access Repository

gsplat: An Open-Source Library for Gaussian Splatting

Ye, Vickie; Li, Ruilong; Kerr, Justin; Turkulainen, Matias; Yi, Brent; Pan, Zhuoyang; Seiskari, Otto; Ye, Jianbo; Hu, Jeffrey; Tancik, Matthew; et al. (February 2025, Journal of Machine Learning Research)

gsplat is an open-source library designed for training and developing Gaussian Splatting methods. It features a front-end with Python bindings compatible with the PyTorch library and a back-end with highly optimized CUDA kernels. gsplat offers numerous features that enhance the optimization of Gaussian Splatting models, which include optimization improvements for speed, memory, and convergence times. Experimental results demonstrate that gsplat achieves up to 10% less training time and 4× less memory than the original Kerbl et al. (2023) implementation. Utilized in several research projects, gsplat is actively maintained on GitHub. Source code is available at https://github.com/nerfstudio-project/gsplat under Apache License 2.0. We welcome contributions from the open-source community.
Free, publicly-accessible full text available February 1, 2026
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction

Kerr, Justin; Kim, Chung Min; Wu, Mingxuan; Yi, Brent; Wang, Qianqian; Goldberg, Ken; Kanazawa, Angjoo (November 2024, Conference on Robot Learning)

Humans can learn to manipulate new objects by simply watching others; providing robots with the ability to learn from such demonstrations would enable a natural interface for specifying new behaviors. This work develops Robot See Robot Do (RSRD), a method for imitating articulated object manipulation from a single monocular RGB human demonstration given a single static multi-view object scan. We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video with differentiable rendering. This analysis-by-synthesis approach uses part-centric feature fields in an iterative optimization which enables the use of geometric regularizers to recover 3D motions from only a single video. Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion. By representing demonstrations as part-centric trajectories, RSRD focuses on replicating the demonstration's intended behavior while considering the robot's own morphological limits, rather than attempting to reproduce the hand's motion. We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot. Each phase of RSRD achieves an average of 87% success rate, for a total end-to-end success rate of 60% across 90 trials. Notably, this is accomplished using only feature fields distilled from large pretrained vision models, without any task-specific training, fine-tuning, dataset collection, or annotation.
Full Text Available
Incremental Learning of Structured Memory via Closed-Loop Transcription

Tong, Shengbang; Dai, Xili; Wu, Ziyang; Li, Mingyang; Yi, Brent; Ma, Yi (June 2023, International Conference on Learning Representations)

This work proposes a minimal computational model for learning structured memories of multiple object classes in an incremental setting. Our approach is based on establishing a closed-loop transcription between the classes and a corresponding set of subspaces, known as a linear discriminative representation, in a low-dimensional feature space. Our method is simpler than existing approaches for incremental learning, and more efficient in terms of model size, storage, and computation: it requires only a single, fixed-capacity autoencoding network with a feature space that is used for both discriminative and generative purposes. Network parameters are optimized simultaneously without architectural manipulations, by solving a constrained minimax game between the encoding and decoding maps over a single rate reduction-based objective. Experimental results show that our method can effectively alleviate catastrophic forgetting, achieving significantly better performance than prior work on generative replay on MNIST, CIFAR-10, and ImageNet-50, despite requiring fewer resources. Source code can be found at https://github.com/tsb0601/i-CTRL
Full Text Available
